-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Config for timeouts and delays in probes #11458
Config for timeouts and delays in probes #11458
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jan-kantert!
The default probe timeouts are 1s, so that's the current behavior. Changing the defaults from 1s to 30s and 5s respectively feels like a big change. I'm not sure how we would determine if these defaults are better.
What do you think about leaving the defaults as the current value of 1s? At least this change will allow users to configure these timeouts explicitly without changing the default behavior.
Sure we can do that. However, those small timeouts cause instability on clusters with high load. I personally would still at least increase the default for the livenessProbe (as Kubernetes recommends to have a longer timeout for your livenessProbe than your readinessProbe). This timeout also has been introduced more or less silently with kubernetes 1.20 (https://kubernetes.io/blog/2020/12/08/kubernetes-1-20-release-announcement/#exec-probe-timeout-handling). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jan-kantert 👍
Please complete the Proxy
struct in ./pkg/charts/linkerd2/values.go
with these new entries for the linkerd install
CLI to be able to render them.
Also, please run ./bin/helm-docs
for the comments added into the values.yaml
file to be picked up in the chart's README.md file, and go test ./... -update
to update the unit test golden files.
Hi @jan-kantert, just checking in to see if you have time to make the above changes? |
Hey @jan-kantert, just to give you an update: I pushed the requested changes to your branch. I did not change anything in the implementation (but I did modify the file to resolve a merge conflict) -- most of my work has been to fix the tests and get this reviewable. Your effort is super appreciated! I think due to the nature of the issue itself (that you have described in #11453) we felt that it would be better to get this fix in soon so we can get it tested asap. The contribution is solely yours :) |
charts/partials/templates/_proxy.tpl
Outdated
initialDelaySeconds: {{.Values.proxy.livenessProbe.initialDelaySeconds | default 10}} | ||
timeoutSeconds: {{.Values.proxy.livenessProbe.timeoutSeconds | default 1}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would get rid of the default
statements, as we already have defaults in the values.yaml
file, and to be consistent with the rest of the file (see startupProbe
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I did not want to touch the impl but I agree.
@@ -198,6 +198,14 @@ proxy: | |||
# "all-unauthenticated", "cluster-authenticated", "cluster-unauthenticated", "deny" | |||
# @default -- "all-unauthenticated" | |||
defaultInboundPolicy: "all-unauthenticated" | |||
# -- LivenessProbe timeout and delay configuration | |||
livenessProbe: | |||
initialDelaySeconds: 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the default here used to be zero? Setting the new default to 10 is going to significantly delay the startup time for pods. How about leaving the default as it was? Pathological cases would still have an escape hatch with this new setting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was hardcoded in to 10, see the _proxy.tpl change. The default did not change. The only defaults we have changed are timeouts since we've added them in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😊
@jan-kantert we've noticed your DCO check wasn't properly signed. We can take this across the finish line if you'd like, but we still need to have all of the commits signed to pass the check :) |
@jan-kantert Hey! We'd like to take this, if you can confirm that you agree with the whole DCO thing. A comment on this issue is fine. 🙂 |
I agree to the DCO! Sorry that I did not get to it before Christmas. Thanks for taking care! |
Add timeout config for readinessProbe and livenessProbes
Add defaults for timeouts and delays
Changed to previous default
changed to previous default
Signed-off-by: Matei David <[email protected]>
Signed-off-by: Matei David <[email protected]>
08d7a34
to
c39b6db
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've rebased and pushed some minor changes. Good to go IMO 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This adds a way to configure probe initialDelay and timeout. Do we also anticipate needing to configure probe period? Should we just add that as well while we're here? Worth a thought but not a blocker.
This release addresses some issues in the destination service that could cause it to behave unexpectedly when processing updates. * Fixed a race condition in the destination service that could cause panics under very specific conditions ([#12022]; fixes [#12010]) * Changed how updates to a `Server` selector are handled in the destination service. When a `Server` that marks a port as opaque no longer selects a resource, the resource's opaqueness will reverted to default settings ([#12031]; fixes [#11995]) * Introduced Helm configuration values for liveness and readiness probe timeouts and delays ([#11458]; fixes [#11453]) (thanks @jan-kantert!) [#12010]: #12010 [#12022]: #12022 [#11995]: #11995 [#12031]: #12031 [#11453]: #11453 [#11458]: #11458 Signed-off-by: Matei David <[email protected]>
Proposed change to implement #11453